A new fault-tolerance framework for grid computing

Author

  • Youcef Derbal
Abstract

Fault detection and propagation in a computational grid require a comprehensive framework that takes into consideration the various conditions of the grid environment, such as the asynchronous nature of communication and the uncertainty of the disseminated fault information. The paper presents a fault-tolerance framework that provides the models needed to manage the local faulty behavior associated with the operation of hosted services. The framework includes a mechanism for quantifying the fault vulnerability of grid nodes and their hosted services. The resulting measures of fault vulnerability are disseminated globally to enable the synthesis of decentralized, fault-tolerant decision-making strategies.
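The abstract does not spell out how vulnerability is quantified or disseminated, so the following is only a minimal sketch under stated assumptions, not the paper's model. It assumes that each node smooths the observed fault rate of its hosted services with an exponentially weighted moving average, that peers merge disseminated reports by keeping the freshest one per node (so late, asynchronously delivered messages cannot overwrite newer information), and that each node makes its own placement decision by picking the least vulnerable known host. The names `NodeVulnerability`, `merge_reports`, `pick_host`, the smoothing factor `alpha`, and the logical `timestamp` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical view type: node id -> (vulnerability score, logical timestamp).
View = dict[str, tuple[float, int]]


@dataclass
class NodeVulnerability:
    """Hypothetical per-node fault-vulnerability estimate (illustrative only)."""

    node_id: str
    alpha: float = 0.3   # assumed EWMA smoothing factor
    score: float = 0.0   # smoothed fault rate of hosted services, in [0, 1]
    timestamp: int = 0   # logical time of the last local update

    def observe(self, faults: int, invocations: int, now: int) -> None:
        """Fold one observation window of hosted-service behavior into the score."""
        if invocations <= 0:
            return  # nothing observed in this window; keep the previous estimate
        rate = faults / invocations
        self.score = self.alpha * rate + (1 - self.alpha) * self.score
        self.timestamp = now

    def report(self) -> View:
        """Package the local estimate for dissemination to other nodes."""
        return {self.node_id: (self.score, self.timestamp)}


def merge_reports(view: View, incoming: View) -> View:
    """Merge disseminated reports, keeping the freshest entry per node.

    Messages may arrive late or out of order, so a report with an older
    timestamp never overwrites a newer one already in the local view.
    """
    merged = dict(view)
    for node, (score, ts) in incoming.items():
        if node not in merged or ts > merged[node][1]:
            merged[node] = (score, ts)
    return merged


def pick_host(view: View, candidates: list[str]) -> str:
    """Local decision: choose the known candidate with the lowest score."""
    known = [n for n in candidates if n in view]
    if not known:
        raise ValueError("no vulnerability information for any candidate")
    return min(known, key=lambda n: view[n][0])


if __name__ == "__main__":
    a, b = NodeVulnerability("A"), NodeVulnerability("B")
    a.observe(faults=2, invocations=10, now=1)   # score becomes 0.3 * 0.2 = 0.06
    b.observe(faults=0, invocations=10, now=1)   # score stays 0.0
    view = merge_reports({}, a.report())
    view = merge_reports(view, b.report())
    view = merge_reports(view, {"A": (0.9, 0)})  # stale report, ignored
    print(pick_host(view, ["A", "B"]))           # prints B
```

The moving average is used here only because it favors recent behavior with constant memory per node; the framework described in the paper may quantify vulnerability quite differently.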


Similar resources

Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique for realizing fault tolerance is periodic checkpointing, which periodically ...

An approach to protecting data-processing operations in computing systems using convolutional codes

We present a framework for algorithm-based fault tolerance (ABFT) methods in the design of fault-tolerant computing systems. The ABFT error detection technique relies on the comparison of parity values computed in two ways. The parallel processing of input parity values produces output parity values that can be compared with parity values regenerated from the original processed outputs. Number data proc...

An approach to fault detection and correction in the design of systems using Turbo codes

We present an approach to the design of fault-tolerant computing systems. In this paper, a technique is employed that enables the combination of several codes in order to obtain flexibility in the design of error-correcting codes. Code-combining techniques are very effective, and turbo codes are one example of them. Algorithm-based fault tolerance techniques for detecting errors rely on the c...

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...

Fault Tolerance in Grid – an Overview

Grid computing has emerged as a distributed methodology that coordinates resources spread across a heterogeneous distributed environment. The resources can be categorized as computational resources and storage resources. A grid is composed of a collection of heterogeneous systems, such as workstations, servers, and computers, that allows access to computing power, data sharing, memory use,...

Journal:
  • Multiagent and Grid Systems

Volume 2

Publication date: 2006